Add BM25 k3 query-term frequency saturation to classic query parser #15818

Open
sgup432 wants to merge 3 commits into apache:main from sgup432:query_parser_k3_saturation

sgup432 commented Mar 13, 2026

Description

Related issue - #15768

Adds a BM25 k3 parameter to the classic query parser for query-side term frequency saturation. When enabled, for example via parser.setK3(8f), a term that appears qtf times in the query string gets a boost of ((k3+1)*qtf)/(k3+qtf) instead of a boost that grows linearly with qtf.
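For reference, the same formula written out, together with its boundary behavior. This is a sketch that assumes k3 >= 0; the symbol w is introduced here for illustration and does not appear in the patch.

```latex
w(\mathit{qtf}) = \frac{(k_3 + 1)\,\mathit{qtf}}{k_3 + \mathit{qtf}},
\qquad w(1) = 1,
\qquad \lim_{\mathit{qtf}\to\infty} w(\mathit{qtf}) = k_3 + 1,
\qquad \lim_{k_3\to\infty} w(\mathit{qtf}) = \mathit{qtf}
```

A single occurrence therefore keeps a weight of 1, the weight of a repeated term can never exceed k3 + 1, and a very large k3 approaches the linear behavior.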

k3 defaults to -1 (disabled), so existing behavior is unchanged and the change is fully backward compatible. Saturation is applied at parse time in QueryParserBase.getBooleanQuery(), so BooleanQuery.rewrite() never sees the duplicates.

Example:

Query: "pizza pizza pizza restaurant" with k3 = 8

Without k3 (current behavior — linear boost via BooleanQuery.rewrite()):

BoostQuery(TermQuery("pizza"), 3.0)  OR  TermQuery("restaurant")

With parser.setK3(8f) — saturated at parse time:

BoostQuery(TermQuery("pizza"), 2.4545)  OR  TermQuery("restaurant")

The weight 2.4545 comes from ((8+1)*3)/(8+3) = 27/11. Repeating a term gives diminishing returns instead of scaling linearly.
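The arithmetic above can be reproduced with a small standalone sketch. It is not the code in this PR and does not use any Lucene API: the class name K3SaturationSketch, both method names, and the whitespace tokenization are hypothetical, and treating any negative k3 as "disabled" is an assumption based on the -1 default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: not the PR's implementation and not a Lucene API. */
final class K3SaturationSketch {

  /** Query-term saturation: ((k3 + 1) * qtf) / (k3 + qtf). */
  static float saturate(int qtf, float k3) {
    return ((k3 + 1f) * qtf) / (k3 + qtf);
  }

  /**
   * Collapses duplicate whitespace-separated terms into one weight per term.
   * Assumption: a negative k3 means "disabled" and falls back to the linear count.
   */
  static Map<String, Float> termWeights(String query, float k3) {
    // Count how often each term occurs, preserving first-seen order.
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String term : query.trim().split("\\s+")) {
      if (!term.isEmpty()) {
        counts.merge(term, 1, Integer::sum);
      }
    }
    // Turn each count into a weight: saturated if k3 is enabled, linear otherwise.
    Map<String, Float> weights = new LinkedHashMap<>();
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      int qtf = e.getValue();
      weights.put(e.getKey(), k3 < 0 ? (float) qtf : saturate(qtf, k3));
    }
    return weights;
  }

  public static void main(String[] args) {
    System.out.println(termWeights("pizza pizza pizza restaurant", 8f));
    System.out.println(termWeights("pizza pizza pizza restaurant", -1f));
  }
}
```

With k3 = 8 this should give "pizza" a weight of about 2.4545 and "restaurant" a weight of 1.0; with k3 = -1 it falls back to the linear weight of 3.0 for "pizza".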

github-actions bot added this to the 11.0.0 milestone Mar 13, 2026